Abstract — This work investigates three aspects: (a) a network vulnerability, namely the non-uniform distribution of vulnerable hosts; (b) threats, i.e., intelligent malwares that exploit such a vulnerability; and (c) defense, i.e., challenges in fighting the threats. We first study five large data sets and observe consistently clustered vulnerable-host distributions. We then present a new metric, referred to as the non-uniformity factor, which quantifies the unevenness of a vulnerable-host distribution. This metric is essentially a Renyi information entropy and characterizes the non-uniformity of a distribution better than the Shannon entropy does. Next, we analyze the propagation speed of network-aware malwares from an information-theoretic viewpoint. In particular, we draw a relationship between Renyi entropies and randomized epidemic malware-scanning algorithms. We find that the infection rates of malware-scanning methods are characterized by Renyi entropies, which relate to the number of information bits that a randomized scanning algorithm extracts from a non-uniform vulnerable-host distribution. Meanwhile, we show that, at an early stage of propagation, a representative network-aware malware can increase its spreading speed by exactly or nearly the non-uniformity factor compared with a random-scanning malware. This quantifies how much more rapidly the Internet can be infected at the early stage when a malware exploits an uneven vulnerable-host distribution as a network-wide vulnerability. Furthermore, we analyze the effectiveness of defense strategies against the spread of network-aware malwares. Our results demonstrate that counteracting network-aware malwares is a significant challenge for strategies that include host-based defense and IPv6.



I. Introduction 

Malware attacks are a significant threat to networks. Malwares are malicious software programs that include worms, viruses, bots, and spyware. A fundamental characteristic of malwares is self-propagation, i.e., a malware can infect vulnerable hosts and use the infected hosts to disseminate itself. A key component of malware propagation is the malware-scanning method, i.e., how effectively and efficiently the malware finds vulnerable targets. Most real worms, especially older ones such as Code Red [16], Slammer [17], and the later Witty [24], use naive random scanning. Random scanning chooses target IP addresses uniformly and does not take any information on network structure into consideration. Advanced scanning methods, however, have been developed that exploit the IP address structure. For example, the Code Red II and Nimda worms used localized scanning [34], [35]. Localized scanning preferentially searches for vulnerable hosts in the local subnetwork. The Blaster worm used sequential scanning in addition to localized scanning [37]. Sequential scanning searches for vulnerable hosts according to their closeness in the IP address space. The AgoBot employed a blacklist of well-known monitored IP address space and avoided scanning these addresses to remain stealthy [20]. A common characteristic of these malwares is that they scan for vulnerable hosts by taking a certain structure in the IP address space into consideration. Such a structure, as we shall soon show, represents a network vulnerability for defenders and an advantage for attackers.

In this paper, we study the perspective of attackers who 
attempt to collect the information on network vulnerabilities 
and design intelligent malwares. By studying this perspective, 
we hope to help defenders better understand and defend 
against malware propagation. For attackers, an open question 
is how certain information can help them design fast-spreading 
malwares. The information may include the vulnerability on 
end hosts, the number of vulnerable hosts, the locations of 
detection systems, and the distributions of vulnerable hosts. 

This work focuses on vulnerable-host distributions. Vulnerable-host distributions have been observed to be bursty and spatially inhomogeneous by Barford et al. [2]. A non-uniform distribution of Witty-worm victims has been reported by Rajab et al. [19]. A Web-server distribution has been found to be non-uniform in the IP address space in our prior work [8]. These discoveries suggest that vulnerable hosts and Web servers may be "clustered" (i.e., non-uniform). Clustering/non-uniformity makes the network vulnerable: if one host in a cluster is compromised, the rest may be compromised rather quickly. Therefore, information on vulnerable-host distributions can be critical for attackers to develop intelligent malwares.

We refer to malwares that exploit information on highly uneven distributions of vulnerable hosts as network-aware malwares. Such malwares include the aforementioned localized-scanning and sequential-scanning malwares. In our prior work, we have studied importance-scanning malwares [8], [7], [14]. Specifically, importance scanning provides a "what-if" scenario: when there are many ways for network-aware malwares to exploit the information on vulnerable hosts, importance scanning is a worst-case threat model and can serve as a benchmark for studying real network-aware malwares. It has been observed that real network-aware and importance-scanning malwares spread much faster than random-scanning malwares [19], [8]. This shows the importance of the problem. It is not well understood, however, how to characterize the relationship between the information on vulnerable-host distributions and the propagation speed of network-aware malwares.

Questions arise. Does there exist a generic characteris- 
tic across different vulnerable-host distributions? If so, how 
do network-aware malwares exploit such a vulnerability? 
How can we defend against such malwares? Our goal is to 
investigate such a generic characteristic in vulnerable-host 
distributions, to quantify its relationship with network-aware 
malwares, and to understand the effectiveness of defense 
strategies. To achieve this goal, we investigate network-aware 
malware attacks in view of information theory, focusing on 
both the worst-case and real network-aware malwares. 

A fundamental concept of information theory is entropy, which measures the uncertainty of the outcomes of a random event. The reduction of uncertainty is measured by the amount of acquired information. We apply the Renyi entropy, a generalized entropy [21], to analyze the uncertainty of finding vulnerable hosts for different malware-scanning methods. This relates malware-attacking methods to the information bits extracted by malwares from the vulnerable-host distribution.

As the first step, we study, from five large-scale measurement sets, the common characteristics of non-uniform vulnerable-host distributions. Then, we derive a new metric, the non-uniformity factor, to characterize the non-uniformity of a vulnerable-host distribution. A larger non-uniformity factor reflects a more non-uniform distribution of vulnerable hosts. We obtain the non-uniformity factors of the vulnerable-host distributions in our data sets and show that all data sets have large non-uniformity factors. Moreover, the non-uniformity factor is a function of the Renyi entropies of orders two and zero [21]. We show that the non-uniformity factor characterizes the unevenness of a distribution better than the Shannon entropy does. Therefore, from an information-theoretic viewpoint, the non-uniformity factor provides a quantitative measure of the unevenness/uncertainty of a vulnerable-host distribution.

Next, we relate the generalized entropy to network-aware scanning methods. The network-aware malwares that we study all use randomized epidemic algorithms. Hence, the generalized entropy matters here because the Renyi entropy characterizes the bits of information extractable by randomized epidemic algorithms. Specifically, we explicitly relate the Renyi entropy to randomized epidemic scanning methods by analyzing the spreading speed of network-aware malwares at an early stage of propagation. A malware that spreads faster at the early stage can in general infect most of the vulnerable hosts in a shorter time. The propagation ability of a malware at the early stage is characterized by the infection rate [32]. We derive the infection rates of a class of network-aware malwares. We find that the infection rates of random-scanning and network-aware malwares are determined by the uncertainty of the vulnerable-host distribution, i.e., by Renyi entropies of different orders. Specifically, a random-scanning malware has the largest uncertainty (the Renyi entropy of order zero), and an optimal importance-scanning malware has the smallest uncertainty (the Renyi entropy of order infinity). Moreover, the infection rates of some real network-aware malwares depend on the non-uniformity factor, i.e., on the Renyi entropy of order two. For example, compared with random scanning, localized scanning can increase the infection rate by nearly the non-uniformity factor. Therefore, the infection rates of malware-scanning algorithms are characterized by Renyi entropies, which relate the efficiency of a randomized scanning algorithm to the uncertainty of a non-uniform vulnerable-host distribution. These analytical results on the relationship between vulnerable-host distributions and network-aware malware spreading ability are validated by simulation.

Finally, we study new challenges to malware defense posed 
by network-aware malwares. Using the non-uniformity factor, 
we show quantitatively that the host-based defense strategies, 
such as proactive protection [4] and virus throttling [27], 
should be deployed at almost all hosts to slow down network- 
aware malwares at the early stage. A partial deployment 
would nearly invalidate such host-based defense. Moreover, 
we demonstrate that the infection rate of a network-aware 
malware in the IPv6 Internet can be comparable to that of the 
Code Red v2 worm in the IPv4 Internet. Therefore, fighting 
network-aware malwares is a real challenge. 

The remainder of this paper is structured as follows. Section II introduces information theory and malware propagation. Section III presents our collected data sets. Section IV formulates the problems studied in this work. Section V introduces a new metric called the non-uniformity factor and compares this metric with the Shannon entropy. Sections VI and VII characterize the spreading ability of network-aware malwares through theoretical analysis and simulation. Section VIII further studies the effectiveness of defense strategies against network-aware malwares. Section IX concludes this paper.

II. Preliminaries 

In this section, we provide the background on information 
theory and malware propagation. 

A. Renyi Entropy and Information Theory 

An entropy is a measure of the average information uncertainty of a random variable [11]. A general entropy, called the Renyi entropy [21], is defined as

$$H_q(X) = \frac{1}{1-q}\log_2 \sum_{x \in \mathcal{X}} \left(P_X(x)\right)^q, \quad q \neq 1, \tag{1}$$

where the random variable $X$ has probability distribution $P_X$ and alphabet $\mathcal{X}$. The well-known Shannon entropy is a special case of the Renyi entropy, i.e.,

$$H(X) = \lim_{q \to 1} H_q(X) = -\sum_{x \in \mathcal{X}} P_X(x)\log_2 P_X(x). \tag{2}$$

It is noted that

$$H_0(X) = \log_2 |\mathcal{X}|, \tag{3}$$

where $|\mathcal{X}|$ is the alphabet size, and

$$H_\infty(X) = -\log_2 \max_{x \in \mathcal{X}} P_X(x), \tag{4}$$

where $H_\infty(X)$ results from $\lim_{q \to \infty} H_q(X)$ and is called the min-entropy of $X$. In this paper, we are also interested in the Renyi entropy of order two, i.e.,

$$H_2(X) = -\log_2 \sum_{x \in \mathcal{X}} \left(P_X(x)\right)^2. \tag{5}$$

Comparing $H_0(X)$, $H(X)$, $H_2(X)$, and $H_\infty(X)$, we have the following theorem, which has been proved in [22], [5].

Theorem 1:

$$H_0(X) \ge H(X) \ge H_2(X) \ge H_\infty(X), \tag{6}$$

with equality if and only if $X$ is uniformly distributed over $\mathcal{X}$.
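The definitions (1) to (5) and the ordering in Theorem 1 can be checked numerically. The following is a minimal Python sketch; the function names `renyi_entropy` and `entropy_chain` are illustrative and not from the paper, and order zero follows (3) by counting every alphabet symbol, including zero-probability ones.

```python
import math


def renyi_entropy(p, q):
    """Renyi entropy H_q (in bits) of a probability vector p, Eq. (1).

    q = 0 uses Eq. (3) (log of the alphabet size, zero entries included),
    q = 1 uses the Shannon limit of Eq. (2), and q = math.inf uses the
    min-entropy of Eq. (4).
    """
    if q == 0:
        return math.log2(len(p))
    if q == 1:
        return -sum(x * math.log2(x) for x in p if x > 0)
    if q == math.inf:
        return -math.log2(max(p))
    # Zero-probability symbols contribute 0**q = 0 for q > 0, so skip them.
    return math.log2(sum(x ** q for x in p if x > 0)) / (1 - q)


def entropy_chain(p):
    """Return [H_0, H, H_2, H_inf], which Theorem 1 says is non-increasing."""
    return [renyi_entropy(p, q) for q in (0, 1, 2, math.inf)]


print(entropy_chain([0.5, 0.25, 0.125, 0.125]))  # strictly decreasing values
print(entropy_chain([0.25] * 4))                  # all equal to 2.0 (uniform case)
```

For a non-uniform distribution the four values decrease strictly, while for the uniform distribution they coincide, matching the equality condition of Theorem 1.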

Information theory has been applied in a wide range of 
fields, such as communication theory, Kolmogorov complex- 
ity, and cryptography. A fundamental result of information 
theory is that data compression can be achieved by assigning 
short codewords to the most frequent outcomes of the data 
source and necessarily longer codewords to the less frequent 
outcomes [11].

B. Malware Propagation 

Similar to data compression, a smart malware that searches for vulnerable hosts can assign more scans to ranges of IP addresses that contain more vulnerable hosts. Thus, the malware can reduce the number of scans needed to infect a large number of vulnerable hosts. We call such a malware a network-aware malware. In essence, network-aware malwares consider the network structure (i.e., an uneven distribution of vulnerable hosts) to speed up propagation.

Many network-aware malwares have been studied. For example, routable-scanning malwares select targets only in the routable address space, using the information provided by the BGP routing table [29], [12]. Evasive worms exploit lightweight sampling to obtain knowledge of the live subnets of the address space and spread only in these networks [20]. In our prior work, we have studied a class of worst-case malwares, called importance-scanning malwares, which exploit non-uniform vulnerable-host distributions in an optimal fashion [8], [7]. Importance scanning is developed from, and named after, importance sampling in statistics. Importance scanning probes the Internet according to an underlying vulnerable-host distribution. Such a scanning method concentrates malware scans on the most relevant parts of an address space and supplies an optimal scanning strategy. Furthermore, if complete information on vulnerable hosts is known, an importance-scanning malware can achieve the top speed of infection and become a flash worm [26].
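To make the idea of probing according to a vulnerable-host distribution concrete, the sketch below draws scan targets in two stages: a /l subnet is picked according to a weight vector q, and then a uniformly random address inside it. This is an illustrative sketch, not the importance-scanning algorithm of [8], [7]; it assumes that subnets are indexed by the integer value of their l-bit prefix and that the caller supplies the weights q (for instance, an estimated vulnerable-host distribution). The function name is hypothetical.

```python
import ipaddress
import random


def importance_scan_targets(q, l, n, rng=None):
    """Draw n IPv4 scan targets from per-subnet weights q.

    q[i] is the (unnormalized) probability of scanning the /l subnet whose
    l-bit prefix equals i; indexing by prefix value is an assumption here.
    """
    rng = rng or random.Random()
    host_bits = 32 - l
    subnets = rng.choices(range(2 ** l), weights=q, k=n)
    targets = []
    for s in subnets:
        offset = rng.getrandbits(host_bits) if host_bits else 0  # uniform within subnet
        targets.append(ipaddress.IPv4Address((s << host_bits) | offset))
    return targets


# Toy example: all weight on 123.0.0.0/8, so every probe lands in that subnet.
weights = [0.0] * 256
weights[123] = 1.0
print(importance_scan_targets(weights, l=8, n=3, rng=random.Random(1)))
```

Uniform weights reduce this to random scanning, so the same routine covers both ends of the spectrum discussed in the text.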

III. Measurements and Vulnerable-Host Distributions

We begin our study by considering how significant the un- 
evenness of vulnerable-host distributions is. We use five large 
data sets to obtain empirical vulnerable-host distributions. 

A. Measurements 

DShield (D1): DShield collects intrusion detection system (IDS) logs [36]. Specifically, DShield provides information on vulnerable hosts by aggregating logs from more than 1,600 IDSes distributed throughout the Internet. We further focus on the following ports that were attacked by worms: 80 (HTTP), 135 (DCE/RPC), 445 (NetBIOS/SMB), 1023 (FTP servers and the remote shell attacked by W32.Sasser.E.Worm), and 6129 (DameWare).

iSinks (P1 and C1): Two unused-address-space monitors run the iSink system [30]. The monitors record the unwanted traffic arriving at unused address spaces that include a Class A network (referred to as "Provider" or P1) and two Class B networks at the campus of the University of Wisconsin (referred to as "Campus" or C1) [2].

Witty-worm victims (W1): A list of Witty-worm victims is provided by CAIDA [24]. CAIDA used a network telescope with approximately $2^{24}$ IP addresses to log the traffic of Witty-worm victims, i.e., hosts running Internet Security Systems (ISS) products.

Web-server list (W2): The IP addresses of Web servers were 
collected through UROULETTE (http://www.uroulette.com/). 
UROULETTE provides a random uniform resource locator 
(URL) generator to obtain a list of IP addresses of Web servers. 

The first three data sets (D1, P1, and C1) were collected over a seven-day period from 12/10/2004 to 12/16/2004 and have been studied in [2] to demonstrate the bursty and spatially inhomogeneous distribution of malicious source IP addresses. The last two data sets (W1 and W2) have been used in our prior work [8] to show the virulence of importance-scanning malwares. Our data sets are summarized in Table I.



TABLE I
Summary of the data sets.

Trace   Description          Number of unique source addresses
D1      DShield              7,694,291
P1      Provider             2,355,150
C1      Campus               448,894
W1      Witty-worm victims   55,909
W2      Web servers          13,866



B. Vulnerable-Host Distributions 

To obtain vulnerable-host group distributions, we use the classless inter-domain routing (CIDR) notation [15]. The Internet is partitioned into subnets according to the first $l$ bits of IP addresses, i.e., /l prefixes or /l subnets. In this division, there are $2^l$ subnets, and each subnet contains $2^{32-l}$ addresses, where $l \in \{0, 1, \ldots, 32\}$. For example, when $l = 8$, the Internet is grouped into Class A subnets (i.e., /8 subnets); when $l = 16$, the Internet is partitioned into Class B subnets (i.e., /16 subnets).
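The /l partition can be computed with Python's standard `ipaddress` module. The helper below (a hypothetical name, for illustration) returns the /l subnet containing an address together with its index, taken here as the integer value of the first l bits, so indices run from 0 to 2^l - 1.

```python
import ipaddress


def slash_l_subnet(addr, l):
    """Return (subnet, index) for the /l subnet containing an IPv4 address.

    The index is the value of the first l bits, i.e. 0 .. 2**l - 1.
    """
    net = ipaddress.ip_network(f"{addr}/{l}", strict=False)
    index = int(net.network_address) >> (32 - l)
    return net, index


net, idx = slash_l_subnet("123.45.67.89", 16)
print(net, idx, net.num_addresses)  # 123.45.0.0/16 31533 65536
```

The `num_addresses` attribute confirms that each /l subnet holds 2^(32-l) addresses.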

We plot the complementary cumulative distribution functions (CCDFs) of our collected data sets in /8 and /16 subnets in Figure 1 on log-log scales. The CCDF is defined as the fraction of subnets with a number of hosts greater than a given value. Figure 1(a) shows population distributions in /8 subnets for D1, P1, C1, W1, and W2, whereas Figure 1(b) exhibits host distributions in /16 subnets for D1 with different ports (80, 135, 445, 1023, and 6129). Figure 1 demonstrates a wide range of populations, indicating highly inhomogeneous address structures. Specifically, the relatively straight lines, such as those for W2 and D1-135, suggest that vulnerable hosts follow a power-law distribution. Similar observations were reported in prior studies, e.g., [16], [17].
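A CCDF of this kind can be computed from per-subnet host counts as follows. This is a sketch; it assumes the fraction is taken over all 2^l subnets, with subnets absent from `counts` treated as empty, which the text does not specify. The function name is illustrative.

```python
from bisect import bisect_right


def subnet_ccdf(counts, l):
    """CCDF points (x, fraction of the 2**l subnets holding more than x hosts).

    counts maps a subnet index to its number of hosts; subnets missing from
    counts are assumed to hold zero hosts.
    """
    total = 2 ** l
    values = sorted(counts.values())
    points = []
    for x in sorted(set(values)):
        greater = len(values) - bisect_right(values, x)  # subnets with > x hosts
        points.append((x, greater / total))
    return points


print(subnet_ccdf({0: 5, 1: 1, 2: 1, 3: 10}, l=2))  # [(1, 0.5), (5, 0.25), (10, 0.0)]
```

Points with a zero fraction cannot be drawn on a log-log plot and would simply be dropped there.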

Fig. 1. CCDF of collected data sets.

Why is the vulnerable-host distribution highly non-uniform in the IPv4 address space? Answering this question would require further empirical study beyond the scope of this work. Nevertheless, we hypothesize several possible reasons. First, no vulnerable hosts can exist in reserved or multicast address ranges [38]. Second, different subnet administrators make different use of their own IP address space. Third, a subnet tends to have many computers with the same operating systems and applications for easy management [25], [6]. Last, some subnets are more protected than others [2], [19].

IV. Problem Formulation 

Motivated by the empirical study, we provide a problem 
formulation in this section. 

A. Characterization 

Let a vulnerable host be a host that can be infected by a malware. A vulnerable host can be either already infected or uninfected. In this work, the term "vulnerable hosts" refers to uninfected vulnerable hosts.

We consider aggregated vulnerable-host distributions. Let $l$ ($0 \le l \le 32$) be an aggregation level of IP addresses as defined in Section III-B. For a given $l$, let $N_i^{(l)}$ be the number of vulnerable hosts in /l subnet $i$, where $1 \le i \le 2^l$. Let $N$ be the total number of vulnerable hosts, where $N = \sum_{i=1}^{2^l} N_i^{(l)}$.

Let $p_g^{(l)}(i)$ ($i = 1, 2, \ldots, 2^l$) be the probability that a randomly chosen vulnerable host is in the $i$-th /l subnet. Then $p_g^{(l)}(i) = N_i^{(l)}/N$ and $\sum_{i=1}^{2^l} p_g^{(l)}(i) = 1$. Thus, the $p_g^{(l)}(i)$'s denote the group distribution of vulnerable hosts in /l subnets.
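Given a list of vulnerable IPv4 addresses, the group distribution can be estimated by counting hosts per /l subnet. A minimal sketch, assuming subnets are keyed by the integer value of their l-bit prefix (0-based rather than the 1-based $i$ in the text) and that empty subnets, whose probability is zero, are simply omitted; the function name is illustrative:

```python
import ipaddress
from collections import Counter


def group_distribution(vulnerable_addrs, l):
    """Return {subnet index: p_g^{(l)}(i)} with p_g^{(l)}(i) = N_i^{(l)} / N."""
    counts = Counter(int(ipaddress.IPv4Address(a)) >> (32 - l)
                     for a in vulnerable_addrs)          # N_i^{(l)}
    n_total = sum(counts.values())                       # N
    return {i: n_i / n_total for i, n_i in counts.items()}


hosts = ["10.0.0.1", "10.0.0.2", "10.9.9.9", "172.16.0.1"]
print(group_distribution(hosts, 8))  # {10: 0.75, 172: 0.25}
```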

Now consider a malware or an adversary that searches for vulnerable hosts. An adversary often does not have complete knowledge of the locations of vulnerable hosts. Hence, malwares make a random guess on which /l subnets are likely to have the most vulnerable hosts. This results in a class of randomized epidemic algorithms for malwares to scan subnets. Let $q_g^{(l)}(i)$ ($i = 1, 2, \ldots, 2^l$) be the probability that a malware scans the $i$-th /l subnet. As we shall see, the $q_g^{(l)}(i)$'s characterize how effectively a malware scans and thus hits vulnerable hosts.

B. Examples 

Consider /8 subnets ($l = 8$). As an extreme case, if all vulnerable hosts are in subnet 123.0.0.0/8, then $p_g^{(8)}(123) = 1$ and $p_g^{(8)}(i) = 0$ for $i \neq 123$. Hence, from a network defender's viewpoint, the network would be extremely vulnerable: if the malware discovers this subnet, it could focus its scans accordingly and potentially find all vulnerable hosts. As the other extreme case, if $p_g^{(8)}(i) = 2^{-8}$ is uniform, it would be harder for the malware to find vulnerable hosts rapidly, as they are uniformly distributed over all /8 subnets.

From an adversary's viewpoint, whether and how vulnerable hosts can be discovered and scanned depends on the randomized algorithm used by the malware. In other words, an adversary can customize the $q_g^{(l)}(i)$'s to make the malware spread either faster or slower. Specifically, for the first extreme case, where all vulnerable hosts are concentrated in subnet 123.0.0.0/8, $q_g^{(8)}(i) = p_g^{(8)}(i)$ would be the best choice, and the resulting malware would spread the fastest. But if the adversary makes the poor choice of $q_g^{(8)}(i)$'s that are uniform across /8 subnets, the resulting malware would spread slowly. Hence, the $p_g^{(l)}(i)$'s and $q_g^{(l)}(i)$'s characterize the severity of network vulnerabilities and the virulence of randomized epidemic scanning algorithms, respectively.

C. Problems 

We now consider network-aware malware attacks from an information-theoretic viewpoint. If a malware obtains partial information on vulnerable hosts (e.g., the $p_g^{(l)}(i)$'s), it can extract information (bits) to design randomized epidemic algorithms (e.g., the $q_g^{(l)}(i)$'s). A fundamental question is how to relate the information on vulnerable-host distributions to the performance of randomized epidemic algorithms. Specifically, we study the following questions:

• What information-theoretic measure can be used to quantify the unevenness of vulnerable-host distributions?

• How much information (bits) on vulnerable hosts can be extracted by the randomized epidemic algorithms used by malwares?

• How good are practical randomized epidemic algorithms?

• What is the effectiveness of defense methods against network-aware malwares?

V. Non-Uniformity Factor 

In this section, we derive a simple metric, called the non- 
uniformity factor, to quantify the vulnerability, i.e., the non- 
uniformity of a vulnerable-host distribution. We show that the 
non-uniformity factor is a function of Renyi entropies. We 
then compare the non-uniformity factor with the well-known 
Shannon entropy. 

A. Definition and Property 

Definition: The non-uniformity factor in /l subnets is defined as

$$\beta^{(l)} = 2^l \sum_{i=1}^{2^l} \left(p_g^{(l)}(i)\right)^2. \tag{7}$$

One property of $\beta^{(l)}$ is that

$$\beta^{(l)} = 2^l \sum_{i=1}^{2^l} \left(p_g^{(l)}(i)\right)^2 \ge \left(\sum_{i=1}^{2^l} p_g^{(l)}(i)\right)^2 = 1. \tag{8}$$

The above inequality is derived from the Cauchy-Schwarz inequality. Equality holds if and only if $p_g^{(l)}(i) = 2^{-l}$ for $i = 1, 2, \ldots, 2^l$. In other words, when the vulnerable-host distribution is uniform, $\beta^{(l)}$ achieves its minimum value 1.

On the other hand, since $p_g^{(l)}(i) \ge \left(p_g^{(l)}(i)\right)^2$,

$$\beta^{(l)} \le 2^l \sum_{i=1}^{2^l} p_g^{(l)}(i) = 2^l. \tag{9}$$

Equality holds when $p_g^{(l)}(j) = 1$ for some $j$ and $p_g^{(l)}(i) = 0$ for $i \neq j$, i.e., all vulnerable hosts concentrate in one subnet. This means that when the vulnerable-host distribution is extremely non-uniform, $\beta^{(l)}$ obtains its maximum value $2^l$. Moreover, assuming that vulnerable hosts are uniformly distributed in the first $n$ ($1 \le n \le 2^l$) /l subnets, i.e., $p_g^{(l)}(i) = 1/n$ for $i = 1, 2, \ldots, n$ and $p_g^{(l)}(i) = 0$ for $i = n+1, \ldots, 2^l$, we have $\beta^{(l)} = 2^l/n$. Therefore, $\beta^{(l)}$ characterizes the non-uniformity of a vulnerable-host distribution. A larger non-uniformity factor reflects a more non-uniform distribution of vulnerable hosts.
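Equation (7) and the bounds (8) and (9) translate directly into code. The sketch below assumes `p` lists the group probabilities of the non-empty /l subnets only; empty subnets add nothing to the sum, so omitting them does not change $\beta^{(l)}$. The function name is illustrative.

```python
def non_uniformity_factor(p, l):
    """beta^{(l)} = 2**l * sum_i p_g^{(l)}(i)**2, Eq. (7)."""
    return 2 ** l * sum(x * x for x in p)


l = 8
print(non_uniformity_factor([2 ** -l] * 2 ** l, l))  # 1.0: uniform, lower bound (8)
print(non_uniformity_factor([1.0], l))               # 256.0: one subnet, upper bound (9)
print(non_uniformity_factor([1 / 16] * 16, l))       # 16.0 = 2**l / n with n = 16
```

The third line illustrates the intermediate case from the text: hosts spread evenly over n = 16 of the 256 /8 subnets give $\beta^{(8)} = 256/16 = 16$.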

The non-uniformity factor is also related to a distance between a vulnerable-host distribution and the uniform distribution. Consider the $L_2$ distance between $P^{(l)} = \{p_g^{(l)}(1), \ldots, p_g^{(l)}(2^l)\}$ and the uniform distribution $P_u^{(l)}$ with $p_u^{(l)}(i) = 2^{-l}$ for $i = 1, 2, \ldots, 2^l$, where

$$\left\|P^{(l)} - P_u^{(l)}\right\|_2^2 = \sum_{i=1}^{2^l} \left(p_g^{(l)}(i) - 2^{-l}\right)^2 = \sum_{i=1}^{2^l} \left(p_g^{(l)}(i)\right)^2 - 2^{-l}, \tag{10}$$

which leads to

$$\beta^{(l)} = 2^l \left\|P^{(l)} - P_u^{(l)}\right\|_2^2 + 1. \tag{11}$$

For a given $l$, $2^l$ is a constant that equals the size of the sample space of /l subnets. Hence, $\beta^{(l)}$ essentially measures the deviation of a vulnerable-host group distribution from the uniform distribution over /l subnets.

How does $\beta^{(l)}$ vary with $l$? When $l = 0$, $\beta^{(0)} = 1$. In the other extreme, where $l = 32$,

$$p_g^{(32)}(i) = \begin{cases} 1/N, & \text{address } i \text{ is vulnerable to the malware;} \\ 0, & \text{otherwise,} \end{cases} \tag{12}$$

which results in $\beta^{(32)} = 2^{32}/N$. More importantly, the ratio of $\beta^{(l)}$ to $\beta^{(l-1)}$ lies between 1 and 2, as shown below.

Theorem 2:

$$\beta^{(l-1)} \le \beta^{(l)} \le 2\beta^{(l-1)}, \tag{13}$$

where $l \in \{1, \ldots, 32\}$.

The proof of Theorem 2 is given in Appendix 1. An intuitive explanation of this theorem is as follows. For /l and /(l-1) subnets, group $i$ ($i = 1, 2, \ldots, 2^{l-1}$) of /(l-1) subnets is partitioned into groups $2i-1$ and $2i$ of /l subnets. If the vulnerable hosts in each group of /(l-1) subnets are equally divided into the groups of /l subnets (i.e., $p_g^{(l)}(2i-1) = p_g^{(l)}(2i) = \frac{1}{2}p_g^{(l-1)}(i)$ for all $i$), then $\beta^{(l)} = \beta^{(l-1)}$. If the division of vulnerable hosts is extremely uneven for all groups (i.e., $p_g^{(l)}(2i-1) = 0$ or $p_g^{(l)}(2i) = 0$ for all $i$), then $\beta^{(l)} = 2\beta^{(l-1)}$. Excluding these two extreme cases, $\beta^{(l-1)} < \beta^{(l)} < 2\beta^{(l-1)}$. Therefore, $\beta^{(l)}$ is a non-decreasing function of $l$. Moreover, the ratio of $\beta^{(l)}$ to $\beta^{(l-1)}$ reflects how unevenly the vulnerable hosts in each /(l-1) subnet are distributed between the two corresponding /l subnets. This ratio is indicated by the slope of $\beta^{(l)}$ in a plot of $\beta^{(l)}$ versus $l$ with a logarithmic y-axis.

B. Estimated Non-Uniformity Factor

Figure 2 shows the non-uniformity factors estimated from our data sets. The non-uniformity factors increase with the prefix length for all data sets. Note that the y-axis is in a log scale. Thus, $\beta^{(l)}$ increases almost exponentially with $l$ over a wide range of $l$. To give a sense of how large $\beta^{(l)}$ can be, $\beta^{(8)}$ and $\beta^{(16)}$ are summarized for all data sets in Table II. It can be observed that $\beta^{(8)}$ and $\beta^{(16)}$ have large values, indicating the significant unevenness of the collected distributions.

TABLE II
β^(8) AND β^(16) OF COLLECTED DISTRIBUTIONS.

           D1      P1      C1      W1      W2
β^(8)      7.9     8.4     9.0     12.0    7.8
β^(16)     31.2    43.2    52.2    126.7   50.2

           D1-80   D1-135  D1-445  D1-1023 D1-6129
β^(8)      7.9     15.4    10.5    48.2    9.1
β^(16)     153.3   186.6   71.7    416.3   128.9
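Theorem 2 and the distance identity (11) can be checked numerically by coarsening a fine-grained distribution one prefix bit at a time. The sketch below uses a synthetic heavy-tailed distribution over $2^{10}$ subnets (an assumption for illustration, not one of the data sets) and treats subnets $2i$ and $2i+1$, 0-based, as the two halves of /(l-1) subnet $i$. The helper names are hypothetical.

```python
import random


def beta(p):
    """Non-uniformity factor, Eq. (7); len(p) must equal 2**l."""
    return len(p) * sum(x * x for x in p)


def coarsen(p):
    """Merge sibling /l subnets 2i and 2i+1 (0-based) into /(l-1) subnet i."""
    return [p[2 * i] + p[2 * i + 1] for i in range(len(p) // 2)]


def l2_form(p):
    """Right-hand side of Eq. (11): 2**l * ||P - P_u||_2**2 + 1."""
    m = len(p)
    return m * sum((x - 1 / m) ** 2 for x in p) + 1


rng = random.Random(7)
L = 10
weights = [rng.paretovariate(1.2) for _ in range(2 ** L)]  # synthetic clustered data
total = sum(weights)
levels = [[w / total for w in weights]]
while len(levels[-1]) > 1:
    levels.append(coarsen(levels[-1]))
levels.reverse()  # levels[l] is the distribution over /l subnets

for l in range(1, L + 1):
    ratio = beta(levels[l]) / beta(levels[l - 1])
    print(l, round(beta(levels[l]), 2), round(ratio, 3))  # ratio stays in [1, 2]
```

Plotting the printed $\beta^{(l)}$ values on a log scale reproduces the kind of monotone curve shown in Figure 2, with the per-step ratio playing the role of the slope.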



Fig. 2. Non-uniformity factors of collected data sets: (a) five data sets; (b) D1 with different ports. The x-axis is the prefix length l; note that the y-axis uses a log scale.

Fig. 3. Shannon entropies of collected data sets: (a) five data sets; (b) D1 with different ports. The x-axis is the prefix length l.

C. Shannon Entropy

To further understand the importance of the non-uniformity factor, we now turn our attention to the Shannon entropy for comparison. It is well known that the Shannon entropy can be used to measure the non-uniformity of a probability distribution [11]. The Shannon entropy in /l subnets is defined as

$$H(P^{(l)}) = -\sum_{i=1}^{2^l} p_g^{(l)}(i)\log_2 p_g^{(l)}(i), \tag{14}$$

where $P^{(l)} = \{p_g^{(l)}(1), p_g^{(l)}(2), \ldots, p_g^{(l)}(2^l)\}$ and $0 \log_2 0$ is taken as 0. It is noted that

$$0 \le H(P^{(l)}) \le l. \tag{15}$$

If a distribution is uniform, $H(P^{(l)})$ achieves the maximum value $l$. On the other hand, if a distribution is extremely non-uniform, e.g., all vulnerable hosts concentrate in one subnet, $H(P^{(l)})$ obtains the minimum value 0.

Furthermore, we compare $H(P^{(l)})$ with $H(P^{(l-1)})$ and find that their difference is between 0 and 1, as shown in the following theorem.

Theorem 3:

$$H(P^{(l-1)}) \le H(P^{(l)}) \le H(P^{(l-1)}) + 1, \tag{16}$$

where $l \in \{1, \ldots, 32\}$.

The proof of Theorem 3 is given in Appendix 2. If the vulnerable hosts in each group of /(l-1) subnets are extremely unevenly divided into the groups of /l subnets (i.e., $p_g^{(l)}(2i-1) = 0$ or $p_g^{(l)}(2i) = 0$ for all $i \in \{1, 2, \ldots, 2^{l-1}\}$), then $H(P^{(l)}) = H(P^{(l-1)})$. If the division of vulnerable hosts is equal for all groups (i.e., $p_g^{(l)}(2i-1) = p_g^{(l)}(2i) = \frac{1}{2}p_g^{(l-1)}(i)$ for all $i$), then $H(P^{(l)}) = H(P^{(l-1)}) + 1$. Excluding these two extreme cases, $H(P^{(l-1)}) < H(P^{(l)}) < H(P^{(l-1)}) + 1$. Therefore, $H(P^{(l)})$ is a non-decreasing function of $l$. Moreover, the difference between $H(P^{(l)})$ and $H(P^{(l-1)})$ reflects how evenly the vulnerable hosts in each /(l-1) subnet are distributed between the two corresponding /l subnets.
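The same coarsening procedure gives a quick numerical check of the bound (15) and of Theorem 3. As before, the skewed toy weights are an assumption for illustration only, and the helper names are hypothetical.

```python
import math
import random


def shannon(p):
    """Shannon entropy in bits, Eq. (14), with 0*log2(0) taken as 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def coarsen(p):
    """Merge sibling /l subnets 2i and 2i+1 (0-based) into /(l-1) subnet i."""
    return [p[2 * i] + p[2 * i + 1] for i in range(len(p) // 2)]


rng = random.Random(3)
L = 8
raw = [rng.random() ** 4 for _ in range(2 ** L)]  # skewed toy weights
s = sum(raw)
p = [x / s for x in raw]
print("H(P^(8)) =", round(shannon(p), 3), "<= 8")  # bound (15)

increments = []  # H(P^(l)) - H(P^(l-1)) for l = L, L-1, ..., 1
while len(p) > 1:
    parent = coarsen(p)
    increments.append(shannon(p) - shannon(parent))
    p = parent
print([round(d, 3) for d in increments])  # each value lies in [0, 1]
```

Because the increments telescope, their sum equals $H(P^{(8)}) - H(P^{(0)}) = H(P^{(8)})$.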

Figure 3 shows the Shannon entropies of our empirical distributions. The line $H(P^{(l)}) = l$ is denoted by the diagonal in the figure. It can be seen that the curves for our collected data sets are similar.

D. Non-Uniformity Factor, Renyi Entropy, and Shannon Entropy

To quantify the difference between the non-uniformity factor and the Shannon entropy, we note that the non-uniformity factor directly relates to the Renyi entropies of orders two and zero, as shown in the following equation:

$$\beta^{(l)} = 2^{\,l - H_2(P^{(l)})} = 2^{H_0(P^{(l)}) - H_2(P^{(l)})}, \tag{17}$$

where $P^{(l)} = \{p_g^{(l)}(1), p_g^{(l)}(2), \ldots, p_g^{(l)}(2^l)\}$. Therefore, the non-uniformity factor is essentially a Renyi entropy: $\log_2 \beta^{(l)}$ is the gap between the Renyi entropies of orders zero and two. Hence, the non-uniformity factor corresponds to a generalized entropy of order 2, whereas the Shannon entropy is the generalized entropy of order 1.
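Equation (17) can be verified on a small example. The sketch assumes a toy distribution over $2^3$ subnets; $H_0(P^{(l)}) = l$ follows from (3) because the alphabet has $2^l$ subnets.

```python
import math

l = 3
p = [0.5, 0.25, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]   # toy /3 group distribution

beta_direct = 2 ** l * sum(x * x for x in p)        # Eq. (7)
h2 = -math.log2(sum(x * x for x in p))              # Eq. (5)
h_shannon = -sum(x * math.log2(x) for x in p if x > 0)
beta_renyi = 2 ** (l - h2)                           # Eq. (17), with H_0 = l

print(beta_direct, round(beta_renyi, 12))  # 3.0 3.0
print(2 ** (l - h_shannon))                # about 2.83, i.e. below beta
```

Replacing $H_2$ by the Shannon entropy in the exponent gives a smaller value here ($2^{1.5} \approx 2.83 < 3$), consistent with $H(P^{(l)}) > H_2(P^{(l)})$ for this non-uniform distribution.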

Why do we choose the non-uniformity factor rather than the Shannon entropy? We compare these two metrics in terms of characterizing a vulnerable-host distribution and find the following fundamental differences.

• In Figure 2, when a distribution is uniform, $\beta^{(l)} = 1$. Hence, the distance between $\beta^{(l)}$ and the horizontal line $\beta^{(l)} = 1$ measures the degree of unevenness. Similarly, the distance between $H(P^{(l)})$ and $l$ in Figure 3 reflects how uniform a distribution is. A larger $H(P^{(l)})$ corresponds to a more even distribution, whereas a larger $\beta^{(l)}$ corresponds to a more non-uniform distribution. In addition, if two distributions have different prefix lengths, we can directly apply the non-uniformity factor (or the Shannon entropy) to compare their unevenness (or evenness). Therefore, the Shannon entropy provides a better measure for describing the evenness of a distribution, while the non-uniformity factor gives a better metric for characterizing the non-uniformity of a distribution.

• From Theorem 1 and Equation (17), we have $\beta^{(l)} > 2^{\,l - H(P^{(l)})}$ when the non-zero $p_g^{(l)}(i)$'s are not all equal. Meanwhile, as evidenced by Figures 2 and 3, the non-uniformity factor magnifies the unevenness of a distribution. Because $\beta^{(l)}$ sums the squared probabilities, it depends more heavily on the large $p_g^{(l)}(i)$'s.

A more important aspect of using the non-uniformity factor is its relation to some real randomized epidemic algorithms (e.g., localized scanning and sequential scanning). Such a relation cannot be drawn using the Shannon entropy. We will show this in the next section.

VI. Network-Aware Malware Spreading Ability

In this section, we explicitly relate the speed of malware 
propagation with the information bits extracted by random- 
scanning and network-aware malwares. 

A. Collision Probability, Uncertainty, and Information Bits 

We begin by defining three important quantities: the collision probability, uncertainty, and information bits. Consider a randomly chosen vulnerable host $Y$. The probability that this host is in /l subnet $i$ is $p_g^{(l)}(i)$. Imagine that a malware guesses which subnet host $Y$ belongs to and chooses a target /l subnet $i$ with probability $q_g^{(l)}(i) = p_g^{(l)}(i)$. Thus, the probability for the malware to make a correct guess is $p_h = \sum_{i=1}^{2^l} \left(p_g^{(l)}(i)\right)^2$. This probability is called the collision probability and is defined in [5]. Such a probability of success is reflected in our non-uniformity factor (by (7), $p_h = \beta^{(l)}/2^l$) and corresponds to a scenario in which the malware knows the underlying vulnerable-host group distribution. Intuitively, the more non-uniform a vulnerable-host distribution is, the larger the probability of success is, i.e., the easier it is for a scan to hit a vulnerable host, the more vulnerable the network is, and the less uncertainty there is in a vulnerable-host distribution.
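The collision probability can also be estimated by simulating the guessing game just described: draw the subnet of a random vulnerable host and, independently, the malware's guess, both from $p_g^{(l)}(i)$. The toy distribution, trial count, and function name below are assumptions for illustration.

```python
import random


def estimate_collision(p, trials, rng):
    """Monte Carlo estimate of P(guess == subnet of Y) when both follow p."""
    subnets = range(len(p))
    host_subnets = rng.choices(subnets, weights=p, k=trials)  # where Y lies
    guesses = rng.choices(subnets, weights=p, k=trials)       # malware's pick
    return sum(h == g for h, g in zip(host_subnets, guesses)) / trials


p = [0.7, 0.2, 0.1, 0.0]            # toy /2 group distribution (l = 2)
exact = sum(x * x for x in p)       # p_h as a sum of squares, about 0.54
beta = 2 ** 2 * exact               # beta^{(2)}, about 2.16, so p_h = beta / 2**l
print(exact, estimate_collision(p, 200_000, random.Random(0)))
```

With 200,000 trials the estimate typically lands within about 0.003 of the exact value.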

We now extend the concept of the collision probability and define $p_h$ as the probability that a malware scan hits the subnet where a randomly chosen vulnerable host is located, i.e.,

$$p_h = \sum_{i=1}^{2^l} p_g^{(l)}(i)\, q_g^{(l)}(i). \qquad (18)$$

Then two important quantities can be defined: 

• $-\log_2 p_h$ as the uncertainty exhibited by the vulnerable-host distribution $p_g^{(l)}(i)$'s.

• $H_0(P^{(l)}) - [-\log_2 p_h]$ as the number of information bits extracted by a randomized epidemic scanning algorithm using $q_g^{(l)}(i)$'s.

Here $-\log_2 p_h$ is regarded as the uncertainty on the vulnerable-host distribution in view of the malware, similar to self-information [39]. For example, if a malware has no information about a vulnerable-host distribution and has to use random scanning, it has the largest uncertainty $H_0(P^{(l)}) = l$ and extracts zero information bits from the distribution. Likewise, the number of information bits extracted by a network-aware malware can be measured as the reduction of the uncertainty and thus equals $H_0(P^{(l)}) - [-\log_2 p_h]$. For example, $\log_2 \beta^{(l)} = H_0(P^{(l)}) - H_2(P^{(l)})$ is the number of information bits extractable by an adversary that chooses $q_g^{(l)}(i) = p_g^{(l)}(i)$.
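The three quantities above can be sketched in a few lines of Python. This is an illustration only, assuming a hypothetical /2 group distribution (four subnets) stored as a list, and taking $H_0(P^{(l)}) = l$ as in the text, i.e., assuming every subnet holds at least one vulnerable host; the function names are our own.

```python
import math

def collision_probability(p, q):
    """Equation (18): p_h = sum_i p_g(i) * q_g(i) over the 2^l subnets."""
    assert len(p) == len(q)
    return sum(pi * qi for pi, qi in zip(p, q))

def uncertainty(p, q):
    """Uncertainty -log2(p_h) seen by a malware that scans subnets with q."""
    return -math.log2(collision_probability(p, q))

def information_bits(p, q):
    """H_0(P) - [-log2 p_h], taking H_0(P) = l = log2(len(p)) as in the text
    (i.e., assuming every subnet contains at least one vulnerable host)."""
    return math.log2(len(p)) - uncertainty(p, q)

# Hypothetical /2 example: 2^2 = 4 subnets, one of them densely populated.
p = [0.7, 0.1, 0.1, 0.1]
uniform = [0.25] * 4

bits_random = information_bits(p, uniform)  # random scanning: ~0 bits
bits_matched = information_bits(p, p)       # q = p: log2(beta) bits
print(bits_random, bits_matched)
```

With the uniform (random-scanning) choice the extracted information is zero up to rounding, while matching $q$ to $p$ extracts $\log_2 \beta^{(l)}$ bits, here $\log_2 2.08$.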

B. Infection Rate 

We characterize the spread of a network-aware malware at an early stage by deriving the infection rate. The infection rate, denoted by α, is defined as the average number of vulnerable hosts that can be infected per unit time by one infected host during the early stage of malware propagation [32]. The infection rate is an important metric for studying network-aware malware spreading ability for two reasons. First, since the number of infected hosts increases exponentially with the rate 1 + α during the early stage, a malware with a higher infection rate can spread much faster at the beginning and thus infect a large number of hosts in a shorter time [8]. Second, while it is generally difficult to derive a closed-form solution for dynamic malware propagation, we can obtain a closed-form expression of the infection rate for different malware scanning methods.
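To make the first reason concrete, the early-stage growth $I_{t+1} = (1+\alpha) I_t$ can be iterated directly. This is a deliberately simplified discrete-time sketch, assuming one initially infected host and ignoring the depletion of vulnerable hosts, so it is meaningful only at the early stage; the two α values are the analytical RS and /16 IS rates listed in Table IV, used here only as inputs.

```python
def early_stage_infected(alpha, steps, initial=1.0):
    """Expected number of infected hosts after `steps` time units under the
    early-stage approximation I_t = I_0 * (1 + alpha)^t. Depletion of the
    vulnerable population is ignored, so this only holds early on."""
    infected = initial
    for _ in range(steps):
        infected *= 1.0 + alpha
    return infected

def steps_to_reach(alpha, target, initial=1.0):
    """Time units until the early-stage approximation reaches `target` hosts."""
    steps, infected = 0, initial
    while infected < target:
        infected *= 1.0 + alpha
        steps += 1
    return steps

# Analytical RS and /16 IS infection rates from Table IV (s = 100).
alpha_rs, alpha_is16 = 0.0105, 0.5456
print(steps_to_reach(alpha_rs, 1000), steps_to_reach(alpha_is16, 1000))
```

Because the rate enters as an exponent base, even a moderate gain in α shortens the time to reach a given number of infections by a large factor.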

Let R denote the (random) number of vulnerable hosts that can be infected per unit time by one infected host during the early stage of malware propagation. The infection rate is the expected value of R, i.e., α = E[R]. Let s be the scanning rate, or the number of scans sent by an infected host per unit time, N be the number of vulnerable hosts, and Ω be the scanning space (i.e., $\Omega = 2^{32}$).

C. Random Scanning 

Random scanning (RS) has been used by most real worms. 
For RS, an infected host sends out s random scans per unit time, and the probability that one scan hits a vulnerable host is $N/\Omega$. Thus, R follows a Binomial distribution $B(s, N/\Omega)$, resulting in¹

$$\alpha_{RS} = E[R] = \frac{sN}{\Omega}. \qquad (19)$$

¹In our derivation, we ignore the dependence between the events that different scans hit the same target at the early stage of malware propagation.
Another way to derive the infection rate of RS is as follows. Consider a randomly chosen vulnerable host Y. The probability that this host is in /l subnet i is $p_g^{(l)}(i)$. An RS malware can make a successful guess on which subnet host Y belongs to with collision probability $p_h = \sum_{i=1}^{2^l} p_g^{(l)}(i)\,2^{-l} = 2^{-l} = 2^{-H_0(P^{(l)})}$. A scan from the RS malware can be regarded as first selecting the /l subnet randomly and then choosing the host in the subnet at random. Hence the probability for the malware to hit host Y is $p_h \cdot 2^{-(32-l)} = 2^{-H_0(P^{(l)}) - (32-l)}$. Since there are N vulnerable hosts, the probability for a malware scan to hit a vulnerable host is $N \cdot 2^{-H_0(P^{(l)}) - (32-l)}$. Thus, R follows a Binomial distribution $B\big(s, N \cdot 2^{-H_0(P^{(l)}) - (32-l)}\big)$, resulting in

$$\alpha_{RS} = E[R] = \frac{sN}{2^{32-l}} \cdot 2^{-H_0(P^{(l)})}. \qquad (20)$$

Therefore, for the RS malware, the uncertainty on the vulnerable-host distribution is $-\log_2 p_h = H_0(P^{(l)})$, i.e., the number of information bits on vulnerable hosts extracted by RS is $H_0(P^{(l)}) - H_0(P^{(l)}) = 0$.

D. Optimal Importance Scanning 

Importance scanning (IS) exploits the non-uniform distribution of vulnerable hosts. We derive the infection rate of IS. An infected host scans /l subnet i with the probability $q_g^{(l)}(i)$. Consider a randomly chosen vulnerable host Y. The probability for this host being in /l subnet i is $p_g^{(l)}(i)$. An IS malware can make a successful guess on which subnet host Y belongs to with collision probability $p_h = \sum_{i=1}^{2^l} p_g^{(l)}(i)\,q_g^{(l)}(i)$. Thus, the probability for the malware to hit host Y is $2^{-(32-l)} \sum_{i=1}^{2^l} p_g^{(l)}(i)\,q_g^{(l)}(i)$. Similar to RS, R of IS follows a Binomial distribution $B\big(s, \frac{N}{2^{32-l}} \sum_{i=1}^{2^l} p_g^{(l)}(i)\,q_g^{(l)}(i)\big)$, which leads to²

$$\alpha_{IS} = E[R] = \frac{sN}{2^{32-l}} \sum_{i=1}^{2^l} p_g^{(l)}(i)\,q_g^{(l)}(i). \qquad (21)$$

Therefore, the uncertainty of the vulnerable-host distribution for an IS malware is $-\log_2 \sum_{i=1}^{2^l} p_g^{(l)}(i)\,q_g^{(l)}(i)$, and the number of information bits on vulnerable hosts extracted by IS is $H_0(P^{(l)}) + \log_2 \sum_{i=1}^{2^l} p_g^{(l)}(i)\,q_g^{(l)}(i)$.
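Equation (21) covers every group scanning distribution, so a single helper can evaluate RS, suboptimal IS, or any other choice of $q_g^{(l)}$. The sketch below assumes IPv4 (32-bit addresses), a hypothetical /2 group distribution, and the scanning parameters later used in Section VII; the helper name is our own.

```python
def infection_rate(s, n, p, q, addr_bits=32):
    """Equation (21): alpha = s*N / 2^(addr_bits - l) * sum_i p(i) q(i).
    p and q are group distributions over the 2^l /l subnets; len(p) is
    assumed to be a power of two."""
    l = (len(p) - 1).bit_length()
    assert len(p) == len(q) == 2 ** l, "need 2^l groups"
    p_h = sum(pi * qi for pi, qi in zip(p, q))
    return s * n / 2 ** (addr_bits - l) * p_h

s, n = 100, 448_894            # scanning rate and vulnerable population
p = [0.7, 0.1, 0.1, 0.1]       # hypothetical /2 group distribution
uniform = [0.25] * 4

alpha_rs = infection_rate(s, n, p, uniform)  # RS: equals s*N / 2^32
alpha_is = infection_rate(s, n, p, p)        # q = p: equals alpha_rs * beta
print(alpha_rs, alpha_is)
```

Plugging in the uniform $q$ recovers Equation (19), which is why the RS derivation above can be viewed as a special case of Equation (21).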

Note that importance scanning can choose the $q_g^{(l)}(i)$'s to maximize the infection rate, resulting in a "worst-case" scenario for defenders, or /l optimal IS (/l OPT_IS) for attackers [8]:

$$\alpha_{OPT\_IS}^{(l)} = \max_{q_g^{(l)}} \alpha_{IS} = \frac{sN}{2^{32-l}} \max_i \{p_g^{(l)}(i)\}. \qquad (22)$$

That is,

$$\alpha_{OPT\_IS}^{(l)} = \frac{sN}{2^{32-l}} \cdot 2^{-H_\infty(P^{(l)})}. \qquad (23)$$

Therefore, the uncertainty on the vulnerable-host distribution for /l OPT_IS is $H_\infty(P^{(l)})$, and the number of information bits on vulnerable hosts extracted by this scanning method is $H_0(P^{(l)}) - H_\infty(P^{(l)})$.

²The same result was derived in [8] but by a different approach.
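The maximization in Equation (22) has a one-line justification, sketched below under the only constraint that $q_g^{(l)}$ is a probability distribution over the $2^l$ subnets: the sum is a convex combination of the $p_g^{(l)}(i)$'s, so it cannot exceed the largest one, and placing all scans on a most-populated subnet attains the bound.

```latex
\sum_{i=1}^{2^l} p_g^{(l)}(i)\,q_g^{(l)}(i)
  \;\le\; \max_i\{p_g^{(l)}(i)\} \sum_{i=1}^{2^l} q_g^{(l)}(i)
  \;=\; \max_i\{p_g^{(l)}(i)\}
  \;=\; 2^{-H_\infty(P^{(l)})},
\qquad \text{with equality for } q_g^{(l)}(i^\ast) = 1,\;
i^\ast = \arg\max_i p_g^{(l)}(i).
```

Multiplying both sides by $sN/2^{32-l}$ gives Equations (22) and (23).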



E. Suboptimal Importance Scanning 

As shown in our prior work [8], the optimal IS is difficult to implement in reality. Hence we also consider a special case of IS, where the group scanning distribution $q_g^{(l)}(i)$ is chosen to be proportional to the number of vulnerable hosts in group i, i.e., $q_g^{(l)}(i) = p_g^{(l)}(i)$. This results in suboptimal IS [8], called /l IS. Thus, the infection rate is

$$\alpha_{IS}^{(l)} = \frac{sN}{2^{32-l}} \sum_{i=1}^{2^l} \big(p_g^{(l)}(i)\big)^2 = \frac{sN}{2^{32-l}} \cdot 2^{-H_2(P^{(l)})} = \alpha_{RS}\,\beta^{(l)}. \qquad (24)$$

Therefore, the uncertainty on the vulnerable-host distribution for /l IS is $H_2(P^{(l)})$, and the corresponding number of information bits extracted is $H_0(P^{(l)}) - H_2(P^{(l)})$, or $\log_2 \beta^{(l)}$. Moreover, compared with RS, /l IS can increase the infection rate by a factor of $\beta^{(l)}$. On the other hand, RS can be regarded as a special case of suboptimal IS when l = 0.
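As a worked check of Equation (24), the short computation below plugs in the parameters reported in Section VII (s = 100, N = 448,894, and a /16 non-uniformity factor of 52.2). It is only arithmetic on those reported numbers, not a re-run of the simulation.

```python
import math

s, n, l = 100, 448_894, 16      # simulation parameters reported in Section VII
beta16 = 52.2                   # reported /16 non-uniformity factor

alpha_rs = s * n / 2 ** 32      # Equation (19)
alpha_is = alpha_rs * beta16    # Equation (24)
info_bits = math.log2(beta16)   # H_0 - H_2 = log2(beta)
uncertainty = l - info_bits     # H_2 = l - log2(beta)

print(f"{alpha_rs:.4f} {alpha_is:.4f} {info_bits:.4f} {uncertainty:.4f}")
```

The four printed values agree with the analytical RS and /16 IS entries of Table IV to four decimals.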

F. Localized Scanning 

Localized scanning (LS) has been used by such real worms as Code Red II and Nimda. LS is a simpler randomized algorithm that utilizes only a few parameters rather than an underlying vulnerable-host group distribution. We first consider a simplified version of LS, called /l LS, which scans the Internet as follows:

• $p_a$ ($0 < p_a < 1$) of the time, an address with the same first l bits is chosen as the target;

• $1 - p_a$ of the time, an address is chosen randomly from the entire IP address space.

Hence LS is an oblivious yet local randomized algorithm where the locality is characterized by the parameter $p_a$. Assume that an initially infected host is randomly chosen from the vulnerable hosts. Let $I_g$ denote the subnet where an initially infected host is located. Thus, $P(I_g = i) = p_g^{(l)}(i)$, where $i = 1, 2, \ldots, 2^l$. For an infected host located in /l subnet i, a scan from this host probes globally with the probability of $1 - p_a$ and hits /l subnet j ($j \ne i$) with the likelihood of $\frac{1-p_a}{2^l}$. Thus, the group scanning distribution for this host is

$$q_g^{(l)}(j) = \begin{cases} p_a + \dfrac{1-p_a}{2^l}, & \text{if } j = i; \\[4pt] \dfrac{1-p_a}{2^l}, & \text{otherwise}, \end{cases} \qquad (25)$$



where $j = 1, 2, \ldots, 2^l$. Given the subnet location of an initially infected host (i.e., /l subnet i), the conditional collision probability, or the probability for a malware scan to hit the subnet where a randomly chosen vulnerable host is located, can be calculated based on Equation (18), i.e.,

$$p_h(i) = p_a\,p_g^{(l)}(i) + \frac{1-p_a}{2^l}. \qquad (26)$$



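Therefore, we can compute the collision probability as

$$p_h = \sum_{i=1}^{2^l} P(I_g = i)\,p_h(i) = p_a \sum_{i=1}^{2^l} \big(p_g^{(l)}(i)\big)^2 + \frac{1-p_a}{2^l}, \qquad (27)$$

resulting in

$$\alpha_{LS}^{(l)} = \frac{sN}{2^{32-l}}\,p_h = \alpha_{RS}\big(1 - p_a + p_a\,\beta^{(l)}\big). \qquad (28)$$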
Therefore, the number of information bits extracted from the vulnerable-host distribution by /l LS is $\log_2\{1 - p_a + p_a\beta^{(l)}\}$, which is between 0 and $H_0(P^{(l)}) - H_2(P^{(l)})$.

Moreover, since $\beta^{(l)} > 1$ ($\beta^{(l)} = 1$ holds for a uniform distribution, which is excluded here), $\alpha_{LS}^{(l)}$ increases with respect to $p_a$. Specifically, when $p_a \to 1$, $\alpha_{LS}^{(l)} \to \alpha_{RS}\beta^{(l)} = \alpha_{IS}^{(l)}$. Thus, /l LS has an infection rate comparable to that of /l IS. In reality, $p_a$ cannot be 1. This is because an LS malware begins spreading from one infected host in a specific subnet; if $p_a = 1$, the malware can never spread out of this subnet. Therefore, we expect that the optimal value of $p_a$ should be large but not 1.
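A Monte Carlo sketch can sanity-check Equation (27) on a toy distribution. It is our own illustration, not the simulator used in Section VII; it assumes a hypothetical /2 distribution, draws the initially infected host from $p$ as stated above, and follows Equation (25) in letting a global scan land in any of the $2^l$ subnets, including the local one.

```python
import random
from itertools import accumulate

def ls_collision_probability(p, p_a):
    """Closed form of Equation (27): p_a * sum_i p(i)^2 + (1 - p_a) / 2^l."""
    return p_a * sum(x * x for x in p) + (1 - p_a) / len(p)

def ls_collision_probability_mc(p, p_a, trials, seed=0):
    """Monte Carlo estimate of p_h for /l LS. Each trial draws the infected
    host's subnet I_g ~ p, one scan target subnet (the local subnet with
    probability p_a, otherwise uniform over all 2^l subnets, as in
    Equation (25)), and the subnet of a random vulnerable host Y ~ p."""
    rng = random.Random(seed)
    groups = range(len(p))
    cum = list(accumulate(p))
    hits = 0
    for _ in range(trials):
        home = rng.choices(groups, cum_weights=cum)[0]
        target = home if rng.random() < p_a else rng.randrange(len(p))
        y = rng.choices(groups, cum_weights=cum)[0]
        hits += target == y
    return hits / trials

p = [0.7, 0.1, 0.1, 0.1]   # hypothetical /2 distribution
exact = ls_collision_probability(p, 0.75)
estimate = ls_collision_probability_mc(p, 0.75, trials=100_000)
print(exact, estimate)
```

Multiplying either value by $sN/2^{32-l}$ gives the corresponding infection rate in Equation (28).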

Next, we further consider another LS, called two-level LS (2LLS), which has been used by the Code Red II and Nimda worms [34], [35]. 2LLS scans the Internet as follows:

• $p_b$ ($0 < p_b < 1$) of the time, an address with the same first byte is chosen as the target;

• $p_c$ ($0 < p_c < 1 - p_b$) of the time, an address with the same first two bytes is chosen as the target;

• $1 - p_b - p_c$ of the time, a random address is chosen.

For example, for the Code Red II worm, $p_b = 0.5$ and $p_c = 0.375$ [34]; for the Nimda worm, $p_b = 0.25$ and $p_c = 0.5$ [35]. Using an analysis similar to that for /l LS, we can derive the infection rate of 2LLS:

$$\alpha_{2LLS} = \alpha_{RS}\big(1 - p_b - p_c + p_b\,\beta^{(8)} + p_c\,\beta^{(16)}\big). \qquad (29)$$

Similarly, the number of information bits extracted from the vulnerable-host distribution by the 2LLS malware is $\log_2\{1 - p_b - p_c + p_b\beta^{(8)} + p_c\beta^{(16)}\}$, which is between 0 and $H_0(P^{(16)}) - H_2(P^{(16)})$.

Since $\beta^{(16)} \ge \beta^{(8)} > 1$ from Theorem 2, $\alpha_{2LLS}$ increases with both $p_b$ and $p_c$. Specifically, when $p_c \to 1$, $\alpha_{2LLS} \to \alpha_{RS}\beta^{(16)} = \alpha_{IS}^{(16)}$. Thus, 2LLS has an infection rate comparable to that of /16 IS. Moreover, $\beta^{(16)}$ is much larger than $\beta^{(8)}$, as shown in Table II for the collected distributions. Hence, $p_c$ is more significant than $p_b$ for 2LLS.
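The 2LLS target-selection rule and Equation (29) can be sketched as follows. The function names, the example source address, and the representation of IPv4 addresses as 32-bit integers are our own assumptions; the sketch ignores reserved or unroutable address ranges that a real worm may skip. The final line evaluates Equation (29) with the Nimda parameters and the reported /8 and /16 non-uniformity factors (9.0 and 52.2) used in Section VII.

```python
import random

def two_level_ls_target(src, p_b, p_c, rng):
    """Pick one 2LLS scan target for an infected host with IPv4 address `src`
    (a 32-bit int): same first byte w.p. p_b, same first two bytes w.p. p_c,
    otherwise a uniformly random address."""
    u = rng.random()
    if u < p_b:
        return (src & 0xFF000000) | rng.getrandbits(24)
    if u < p_b + p_c:
        return (src & 0xFFFF0000) | rng.getrandbits(16)
    return rng.getrandbits(32)

def alpha_2lls(alpha_rs, p_b, p_c, beta8, beta16):
    """Equation (29)."""
    return alpha_rs * (1 - p_b - p_c + p_b * beta8 + p_c * beta16)

rng = random.Random(1)
src = (10 << 24) | (20 << 16) | (30 << 8) | 40   # hypothetical host 10.20.30.40
targets = [two_level_ls_target(src, 0.25, 0.5, rng) for _ in range(10_000)]
same_16 = sum(t >> 16 == src >> 16 for t in targets) / len(targets)

print(same_16, round(alpha_2lls(100 * 448_894 / 2**32, 0.25, 0.5, 9.0, 52.2), 4))
```

About half of the generated targets share the source's first two bytes, as expected for $p_c = 0.5$, and the rounded rate matches the analytical 2LLS entry of Table IV.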

G. Modified Sequential Scanning 

The Blaster worm is a real malware that exploits sequential scanning in combination with localized scanning. A sequential-scanning malware studied in [33], [11] begins to scan addresses sequentially from a randomly chosen starting IP address and has a propagation speed similar to that of a random-scanning malware. The Blaster worm selects its starting point locally as the first address of its Class C subnet with probability 0.4 [37], [33]. To analyze the effect of sequential scanning, we do not incorporate localized scanning. Specifically, we consider our /l modified sequential-scanning (MSS) malware, which scans the Internet as follows:

• Newly infected host A begins with random scanning until finding a vulnerable host with address B.

• After infecting the target B, host A continues to sequentially scan IP addresses B + 1, B + 2, ... (or B - 1, B - 2, ...) in the /l subnet where B is located.

Such a sequential malware-scanning strategy is similar in spirit to the nearest-neighbor rule, which is widely used in pattern classification [10]. The basic idea is that if the vulnerable hosts are clustered, the neighbor of a vulnerable host is likely to be vulnerable as well.

Such a /l MSS malware has two stages. In the first stage (called MSS_1), the malware uses random scanning and has an infection rate of $\alpha_{RS}$, i.e., $\alpha_{MSS\_1} = \alpha_{RS}$. In the second stage (called MSS_2), the malware scans sequentially in a /l subnet. The first l bits of a target address are fixed, whereas the last 32 - l bits of the address are generated additively or subtractively modulo $2^{32-l}$. Let $I_g$ denote the subnet where B is located. Thus, $P(I_g = i) = p_g^{(l)}(i)$, where $i = 1, 2, \ldots, 2^l$. Since an MSS_2 malware scans only the subnet where B is located, the conditional collision probability is $p_h(i) = p_g^{(l)}(i)$, which leads to $p_h = \sum_{i=1}^{2^l} \big(p_g^{(l)}(i)\big)^2$. Thus, the infection rate is

$$\alpha_{MSS\_2} = \alpha_{RS}\,\beta^{(l)}. \qquad (30)$$

Therefore, the uncertainty on vulnerable hosts for /l MSS is between $H_2(P^{(l)})$ and $H_0(P^{(l)})$. Moreover, the infection rate of /l MSS is between $\alpha_{RS}$ and $\alpha_{RS}\beta^{(l)}$.
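The MSS_2 address arithmetic described above (fixed /l prefix, last 32 - l bits stepped modulo $2^{32-l}$) can be sketched as a small generator. The function name and the example address of B are hypothetical; addresses are handled as 32-bit integers and converted with Python's standard `ipaddress` module only for display.

```python
import ipaddress

def mss2_targets(b, l, count, step=1):
    """Yield `count` MSS_2 targets following vulnerable host B (a 32-bit int).
    The first l bits stay fixed; the last 32 - l bits move by `step`
    (+1 additive, -1 subtractive) modulo 2^(32 - l)."""
    host_bits = 32 - l
    size = 1 << host_bits
    prefix = (b >> host_bits) << host_bits   # keep the /l prefix of B
    offset = b - prefix                      # position of B inside its subnet
    for k in range(1, count + 1):
        yield prefix | ((offset + step * k) % size)

b = int(ipaddress.IPv4Address("10.20.255.254"))  # hypothetical address of B
print([str(ipaddress.IPv4Address(t)) for t in mss2_targets(b, 16, 3)])
# wraps around within 10.20.0.0/16:
# ['10.20.255.255', '10.20.0.0', '10.20.0.1']
```

The modulo keeps every probe inside B's /l subnet, which is exactly why the MSS_2 collision probability reduces to $p_g^{(l)}(i)$ once B's subnet is fixed.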

H. Summary 

In summary, the uncertainty on the vulnerable-host distribution and the corresponding number of information bits extracted by different randomized epidemic algorithms depend on the Renyi entropies of different orders, as shown in Table III. Moreover, the number of information bits extractable by network-aware malwares bridges the entropy of a vulnerable-host distribution and the malware propagation speed, as shown in the following equation:

$$\text{Information bits} = H_0(P^{(l)}) - [-\log_2 p_h] = \log_2 \left[\frac{\alpha}{\alpha_{RS}}\right], \qquad (31)$$

where $p_h$ is the collision probability and $\alpha$ is the infection rate of the malware.
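The second equality in Equation (31) follows in one step, sketched here for any algorithm whose rate has the form $\alpha = \frac{sN}{2^{32-l}}\,p_h$ (Equations (21), (28), and (30)) and with $H_0(P^{(l)}) = l$ as used throughout this section:

```latex
\frac{\alpha}{\alpha_{RS}}
  = \frac{sN\,p_h / 2^{32-l}}{sN / 2^{32}}
  = 2^{l}\,p_h
\quad\Longrightarrow\quad
\log_2 \frac{\alpha}{\alpha_{RS}}
  = l + \log_2 p_h
  = H_0(P^{(l)}) - [-\log_2 p_h].
```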

When /l subnets are considered, RS has the largest uncertainty $H_0(P^{(l)})$, while optimal IS has the smallest uncertainty $H_\infty(P^{(l)})$. Moreover, the infection rates of all three network-aware malwares (suboptimal IS, LS, and MSS) can be far larger than that of an RS malware, depending on the non-uniformity factors (i.e., $\beta^{(l)}$) or the Renyi entropy of order two (i.e., $H_2(P^{(l)})$). The infection rates of all these scanning algorithms are characterized by the Renyi entropies, relating the efficiency of a randomized scanning algorithm to the information bits in a non-uniform vulnerable-host distribution.

Hence we relate information theory to network-aware malware propagation through the Renyi entropy. The uncertainty of a vulnerable-host group probability distribution, which is quantified by the Renyi entropy, is important for an attacker designing a network-aware malware. If there is no uncertainty about the distribution of vulnerable hosts (e.g., either all vulnerable hosts are concentrated in one subnet or all information about vulnerable hosts is known), the entropy is minimum, and the malware that uses the information on the distribution can spread fastest by employing optimal importance scanning. On the other hand, if there is maximum uncertainty (e.g., vulnerable hosts are uniformly distributed), the entropy is maximum. In this case, the best a malware can do with a uniform distribution is to use random scanning. In general, the more information an attacker obtains about a non-uniform vulnerable-host distribution, the faster the resulting malware can spread.

TABLE III
Summary of Randomized Epidemic Scanning Algorithms and Information Bits

| Algorithm | Infection Rate | Information Bits Extracted |
|---|---|---|
| RS | $\alpha_{RS} = \frac{sN}{2^{32}}$ | 0 |
| /l OPT_IS | $\frac{sN}{2^{32-l}} \cdot 2^{-H_\infty(P^{(l)})}$ | $H_0(P^{(l)}) - H_\infty(P^{(l)})$ |
| /l IS | $\alpha_{RS}\,\beta^{(l)}$ | $\log_2 \beta^{(l)}$ |
| /l LS | $\alpha_{RS}(1 - p_a + p_a\beta^{(l)})$ | $\log_2\{1 - p_a + p_a\beta^{(l)}\}$ |
| 2LLS | $\alpha_{RS}(1 - p_b - p_c + p_b\beta^{(8)} + p_c\beta^{(16)})$ | $\log_2\{1 - p_b - p_c + p_b\beta^{(8)} + p_c\beta^{(16)}\}$ |
| /l MSS_2 | $\alpha_{RS}\,\beta^{(l)}$ | $\log_2 \beta^{(l)}$ |

VII. Simulation and Validation 

We now validate our analytical results through simulations 
on the collected data sets. 

A. Infection Rate 

We first focus on validating infection rates. We apply discrete event simulation to our experiments [23]. Specifically, we simulate the searching process of a malware using different scanning methods at the early stage. We use the C1 data set for the vulnerable-host distribution. Note that the C1 distribution has the non-uniformity factors $\beta^{(8)} = 9.0$ and $\beta^{(16)} = 52.2$, and $\max_i\{p_g^{(16)}(i)\} = 0.0041$. The malware spreads over the C1 distribution with N = 448,894 and has a scanning rate s = 100. The uncertainty on the vulnerable-host distribution and the information bits extractable for different scanning methods are shown in Table IV. The simulation stops when the malware has sent out $10^{\ldots}$ scans for RS, /16 OPT_IS, /16 IS, /16 LS, and 2LLS, and 65,535 scans for /16 MSS_2. Then, we count the number of vulnerable hosts hit by the malware scans and compute the infection rate. The results are averaged over $10^{\ldots}$ runs. Table IV compares the simulation results (i.e., sample mean) with the analytical results (i.e., Equations (20), (22), (24), (28), (29), and (30)). Here, a /16 LS malware uses $p_a = 0.75$, whereas a 2LLS malware employs $p_b = 0.25$ and $p_c = 0.5$. We observe that the sample means and the analytical results are almost identical.

We observe that network-aware malwares have much larger infection rates than random-scanning malwares. LS indeed increases the infection rate by nearly a non-uniformity factor and approaches the capacity of suboptimal IS. This is significant, as LS depends on only one or two parameters (i.e., $p_a$ for /l LS and $p_b$, $p_c$ for 2LLS), while IS requires information about the vulnerable-host distribution. On the other hand, LS has a larger sample variance than IS, as indicated by Table IV. This implies that the infection speed of an LS malware depends on the location of the initially infected hosts. If the LS malware begins spreading from a subnet containing densely populated vulnerable hosts, the malware would spread rapidly. Furthermore, we notice that the MSS malware also has a large infection rate at the second stage, indicating that MSS can indeed exploit the clustering pattern of the distribution. Meanwhile, the large sample variance of the infection rate of MSS_2 reflects that an MSS malware strongly depends on the initially infected hosts. We further compute the infection rate of a /16 MSS malware that includes both the random-scanning and sequential-scanning stages. Simulation results are averaged over $10^{\ldots}$ runs and are summarized in Table V. These results strongly depend on the total number of malware scans. When the number of malware scans is small, an MSS malware behaves similarly to a random-scanning malware. As the number of malware scans increases, the MSS malware spends more scans in the second stage and thus has a larger infection rate.



TABLE V
Infection rates of a /16 MSS malware.

| # of malware scans | 10 | 100 | 1000 | 10000 | 50000 |
|---|---|---|---|---|---|
| Sample mean | 0.0108 | 0.0190 | 0.0728 | 0.2866 | 0.4298 |
| Sample variance | 0.1246 | 0.1346 | 0.1659 | 0.2498 | 0.2311 |



B. Dynamic Malware Propagation 

An infection rate only characterizes the early stage of malware propagation. We now employ the analytical active worm propagation (AAWP) model and its extensions to characterize the entire spreading process of malwares [6]. Specifically, the spread of RS and IS malwares is implemented as described in [8], whereas the propagation of LS malwares is modeled according to [19]. The parameters that we use to simulate a malware are comparable to those of the Code Red v2 worm. Code Red v2 has a vulnerable population N = 360,000 and a scanning rate s = 358 per minute [31]. We assume that the malware begins spreading from an initially infected host that is located in the subnet containing the largest number of vulnerable hosts.

We first show the propagation speeds of network-aware malwares for the same vulnerable-host distribution from data set D1-80. From Section VI, we expect that a network-aware malware can spread much faster than an RS malware. Figure 4 demonstrates such an example on a malware that uses different scanning methods. It takes an RS malware 10 hours to infect 99% of vulnerable hosts, whereas a /8 LS malware with $p_a = 0.75$ or a /8 IS malware takes only about 3.5 hours.
A /16 LS malware with $p_a = 0.75$ or a 2LLS malware with $p_b = 0.25$ and $p_c = 0.5$ can further reduce the time to 1 hour. A /16 IS malware spreads fastest and takes only 0.5 hour.

TABLE IV
Uncertainty on the vulnerable-host distribution, information bits, and infection rates of different scanning methods.

| Scanning method | Uncertainty (analytical) | Information bits (analytical) | Infection rate (analytical) | Infection rate (sample mean) | Infection rate (sample variance) |
|---|---|---|---|---|---|
| RS | 16 | 0 | 0.0105 | 0.0103 | 0.0010 |
| /16 OPT_IS | 7.9266 | 8.0734 | 2.8152 | 2.7745 | 0.2597 |
| /16 IS | 10.2940 | 5.7060 | 0.5456 | 0.5454 | 0.0543 |
| /16 LS | 10.6999 | 5.3001 | 0.4118 | 0.4023 | 0.2072 |
| 2LLS | 11.1620 | 4.8380 | 0.2989 | 0.2942 | 0.1053 |
| /16 MSS_2 | 10.2940 | 5.7060 | 0.5456 | 0.5489 | 0.3186 |

(Curves: /16 IS; 2LLS with p_b = 0.25 and p_c = 0.5; /16 LS with p_a = 0.75; /8 IS; /8 LS with p_a = 0.75; RS.)
Fig. 4. A network-aware malware spreads over the D1-80 distribution.

(Horizontal axis: Time (second).)
Fig. 5. A 2LLS malware spreads over different distributions.

We also study the effect of vulnerable-host distributions on the propagation of network-aware malwares. From Table II, we observe that $\beta^{(16)}_{D1\text{-}1023} > \beta^{(16)}_{W1} > \beta^{(16)}_{C1} > \beta^{(16)}_{D1}$. Thus, we expect that a network-aware malware using the /16 D1-1023 distribution would spread faster than one using the other three distributions. Figure 5 verifies this through simulations of the spread of a 2LLS malware that uses different vulnerable-host distributions (i.e., D1-1023, W1, C1, and D1). Here, the 2LLS malware employs the same parameters as the Nimda worm, i.e., $p_b = 0.25$ and $p_c = 0.5$. As expected, the malware using the D1-1023 distribution spreads fastest, especially at the early stage of malware propagation.



VIII. Effectiveness of Defense Strategies

What are new requirements and challenges for a defense 
system to slow down the spread of a network-aware malware? 
We study the effectiveness of defense strategies through non- 
uniformity factors. 



A. Host-Based Defense 

Host-based defense has been widely used against random-scanning malwares. Proactive protection and virus throttling are examples of host-based defense strategies.

A proactive protection (PP) strategy proactively hardens a system, making it difficult for a malware to exploit vulnerabilities [4]. Techniques used by PP include address-space randomization, pointer encryption, instruction-set randomization, and password protection. Thus, a malware requires multiple trials to compromise a host that implements PP. Specifically, let p ($0 < p < 1$) denote the protection probability, or the probability that a single malware attempt succeeds in infecting a vulnerable host that implements PP. On average, a malware should make $1/p$ exploit attempts to compromise the target. We assume that hosts with PP are uniformly deployed in the Internet. Let d ($0 < d < 1$) denote the deployment ratio of the number of hosts with PP to the total number of hosts.

To show the effectiveness of the PP strategy, we consider the infection rate of a /l IS malware. Since some of the vulnerable hosts now implement PP, Equation (24) changes to

$$\alpha_{IS}^{(l)} = \frac{sN}{2^{32-l}} \sum_{i=1}^{2^l} \Big[d\,p\,\big(p_g^{(l)}(i)\big)^2 + (1-d)\big(p_g^{(l)}(i)\big)^2\Big] = \alpha_{RS}\,\beta^{(l)}(1-d+dp). \qquad (32)$$

To slow down the spread of a suboptimal IS malware to that of a random-scanning malware, we need $\beta^{(l)}(1-d+dp) < 1$, resulting in

$$p < 1 - \frac{1 - 1/\beta^{(l)}}{d}. \qquad (33)$$

When PP is fully deployed, i.e., d = 1, p can be at most $1/\beta^{(l)}$. On the other hand, if PP provides perfect protection, i.e., p = 0, d should be at least $1 - 1/\beta^{(l)}$. Therefore, when $\beta^{(l)}$ is large, Inequality (33) imposes stringent requirements on the PP strategy. For example, if $\beta^{(16)} = 50$ (most of the $\beta^{(16)}$'s in Table II are larger than this value), p < 0.02 and d > 0.98. That is, a PP strategy should be almost fully deployed and provide nearly perfect protection for a vulnerable host.
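The requirements implied by Equations (32) and (33) are easy to tabulate. The sketch below is our own arithmetic helper (the function names are hypothetical); `min_deployment_ratio` comes from rearranging the same condition $\beta^{(l)}(1-d+dp) < 1$ for d instead of p.

```python
def residual_speedup(beta, p, d):
    """Factor beta * (1 - d + d*p) by which a /l IS malware still outpaces
    RS when PP (or VT) is deployed on a fraction d of hosts, Equation (32)."""
    return beta * (1 - d + d * p)

def max_protection_probability(beta, d):
    """Right-hand side of Inequality (33): p must stay below this value for
    deployment ratio d (a non-positive result means no p suffices)."""
    return 1 - (1 - 1 / beta) / d

def min_deployment_ratio(beta, p=0.0):
    """Deployment ratio d above which beta * (1 - d + d*p) < 1, obtained by
    rearranging the same condition for d; meaningful only if p < 1/beta."""
    return (1 - 1 / beta) / (1 - p)

print(max_protection_probability(50, 1.0), min_deployment_ratio(50))
print(residual_speedup(52.2, p=0.01, d=0.8))
```

For $\beta^{(16)} = 50$ this reproduces the bounds p < 0.02 and d > 0.98 stated above; with $\beta^{(16)} = 52.2$, p = 0.01, and d = 0.8, the early-stage rate per Equation (32) is still about 10.9 times $\alpha_{RS}$.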

We extend the model described in [8] to characterize the spread of suboptimal IS malwares under the defense of the PP strategy and show the results in Figure 6. Here, Code-Red-v2-like malwares spread over the C1 distribution with $\beta^{(16)} = 52.2$. It is observed that even when the protection probability is small (e.g., p = 0.01) and the deployment ratio is high (e.g., d = 0.8), a /16 IS malware is slowed down only a little at the early stage, compared with a /16 IS malware without the PP defense (i.e., p = 1 and d = 0). Moreover, when p is small (e.g., p < 0.02), d is a more sensitive parameter than p.

Fig. 6. A /16 IS malware spreads under the defense of PP.

We next consider the virus throttling (VT) strategy [27]. VT constrains the number of outgoing connections of a host and can thus reduce the scanning rate of an infected host. We find that Equation (32) also holds for this strategy, except that p is the ratio of the scanning rate of infected hosts with VT to that of infected hosts without VT. Therefore, VT also needs to be almost fully deployed to fight network-aware malwares effectively.

From these two strategies, we have learned that an effective strategy should reduce either $\alpha_{RS}$ or $\beta^{(l)}$. Host-based defense, however, is limited in such capabilities.

B. IPv6 

IPv6 can decrease $\alpha_{RS}$ significantly by increasing the scanning space [32]. But the non-uniformity factor would increase the infection rate if the vulnerable-host distribution is still non-uniform. Hence, an important question is whether IPv6 can counteract network-aware malwares when both $\alpha_{RS}$ and $\beta^{(l)}$ are taken into consideration.

We study this issue by computing the infection rate of a network-aware malware in the IPv6 Internet. As pointed out in [...], a smart malware can first detect some vulnerable hosts in /64 subnets containing many vulnerable hosts, then release itself to the hosts on the hitlist, and finally spread inside these subnets. Such a malware only scans the local /64 subnet. Thus, we focus on the spreading speed of a network-aware malware in a /64 subnet. From Figure 2, we extrapolate that $\beta^{(32)}$ in the IPv6 Internet can be on the order of $10^{\ldots}$ if hosts are still distributed in a clustered fashion. Using the parameters $N = 10^{\ldots}$ proposed by [18] and s = 4,000 used by the Slammer worm [17], we derive the infection rate of a /32 IS malware in a /64 subnet of the IPv6 Internet: $\alpha_{IS}^{(32)} = \frac{sN}{2^{64}} \cdot \beta^{(32)} \approx 2.2 \times 10^{\ldots}$. This $\alpha_{IS}^{(32)}$ is larger than the infection rate of the Code Red v2 worm in the IPv4 Internet, where $\alpha_{RS} = \frac{360{,}000 \times 358/60}{2^{32}} \approx 5 \times 10^{-4}$.
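The Code Red v2 baseline can be reproduced directly, and the IPv6 comparison follows the same formula with a $2^{64}$ scanning space. In this sketch the helper names are our own, and the IPv6 values of N and $\beta^{(32)}$ are deliberately left as function arguments rather than filled in.

```python
def alpha_random(s, n, addr_bits):
    """alpha_RS = s*N / 2^addr_bits (Equation (19) with Omega = 2^addr_bits)."""
    return s * n / 2 ** addr_bits

def alpha_is_subnet(s, n, beta, addr_bits):
    """Suboptimal-IS rate alpha_RS * beta (Equation (24)) in a 2^addr_bits space,
    e.g., addr_bits = 64 for scanning inside one IPv6 /64 subnet."""
    return alpha_random(s, n, addr_bits) * beta

# Code Red v2 in IPv4: 358 scans per minute converted to per second, N = 360,000.
code_red = alpha_random(358 / 60, 360_000, 32)
print(f"{code_red:.2e}")
```

For the IPv6 case one would call `alpha_is_subnet(4000, N, beta32, 64)` with the chosen N and $\beta^{(32)}$ and compare the result against `code_red`.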

Therefore, IPv6 can only slow down the spread of a network-aware malware to that of a random-scanning malware in IPv4. To defend against the malware effectively, we should further consider how to slow down the increase of $\beta^{(l)}$ as l increases when IPv4 is upgraded to IPv6. In essence, we should reduce the information bits extractable by network-aware malwares from the vulnerable-host distribution.

IX. Conclusions 

In this paper, we have first obtained and characterized empirical vulnerable-host distributions, using five large measurement sets from different sources. We have derived a simple metric, known as the non-uniformity factor, to quantify an uneven distribution of vulnerable hosts. The non-uniformity factors have been computed for the empirical vulnerable-host distributions using our collected data, and all of them are large. This implies that the non-uniformity of the vulnerable-host distribution is significant and seems to be consistent across networks and applications. Moreover, the non-uniformity factor, expressed as a function of the Renyi entropies of order two and zero, better characterizes the uneven feature of a distribution than the Shannon entropy.

We have drawn a relationship between Renyi entropies 
and randomized epidemic scanning algorithms. In particular, 
we have quantified the spreading ability of network-aware 
malwares that utilize randomized scanning algorithms at the 
early stage. These randomized malware-scanning algorithms 
range from optimal randomized scanning (e.g., importance 
scanning) to real malware scanning (e.g., localized scanning). 
We have derived analytical expressions relating the infection rates of network-aware malwares to the uncertainty (i.e., Renyi entropy) of finding vulnerable hosts. We have derived
and empirically verified that localized scanning and modified 
sequential scanning can increase the infection rate by nearly 
a non-uniformity factor when compared to random scanning 
and thus approach the capacity of suboptimal importance 
scanning. As a result, we have bridged the information bits 
extracted by malwares from a vulnerable-host distribution with 
the propagation speed of network-aware malwares. 

Furthermore, we have evaluated the effectiveness of several 
commonly used defense strategies on network-aware mal- 
wares. The host-based defense, such as proactive protection 
or virus throttling, needs to be almost fully deployed to
slow down malware spreading at the early stage. This implies 
that host-based defense would be weakened significantly by 
network-aware scanning. More surprisingly, different from 
previous findings, we have shown that network-aware mal- 
wares can be zero-day malwares in the IPv6 Internet if 
vulnerable hosts are still clustered. These findings present a 
significant challenge to malware defense: Entirely different 
strategies may be needed for fighting against network-aware 
malwares. 

The information-theoretical view of malware attacks provides a quantification and a better understanding of three aspects: a non-uniform vulnerable-host distribution characterized by the non-uniformity factor or the Renyi entropy of order two, randomized malware-scanning algorithms characterized by the infection rate or the Renyi entropies of different orders, and the effectiveness of defense strategies.

As part of our ongoing work, we plan to study in more 
depth relationships between information theory and dynamic 
malware attacks for developing effective detection and defense 
systems that would take vulnerable-host distributions into 
consideration. 
APPENDIX

APPENDIX 1: PROOF OF THEOREM 2

Proof: Group i ($i = 1, 2, \ldots, 2^{l-1}$) of /(l - 1) subnets is partitioned into groups 2i - 1 and 2i of /l subnets. Thus,

$$p_g^{(l-1)}(i) = p_g^{(l)}(2i-1) + p_g^{(l)}(2i), \qquad (A\text{-}1)$$

where $i = 1, 2, \ldots, 2^{l-1}$. Then, $\beta^{(l)}$ is related to $\beta^{(l-1)}$ by the Cauchy-Schwarz inequality:



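$$\beta^{(l)} = 2^l \sum_{i=1}^{2^l} \big(p_g^{(l)}(i)\big)^2 = 2^{l-1} \sum_{i=1}^{2^{l-1}} 2\sum_{j=1}^{2} \big(p_g^{(l)}(2(i-1)+j)\big)^2 \ge 2^{l-1} \sum_{i=1}^{2^{l-1}} \Big(\sum_{j=1}^{2} p_g^{(l)}(2(i-1)+j)\Big)^2 = \beta^{(l-1)}. \qquad (A\text{-}2)$$

The equality holds when $p_g^{(l)}(2i-1) = p_g^{(l)}(2i)$ for all $i = 1, 2, \ldots, 2^{l-1}$. That is, in each /(l - 1) subnet, the vulnerable hosts are uniformly distributed in two groups of /l subnets.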

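On the other hand, since

$$\big(p_g^{(l)}(2i-1)\big)^2 + \big(p_g^{(l)}(2i)\big)^2 \le \big(p_g^{(l)}(2i-1) + p_g^{(l)}(2i)\big)^2 = \big(p_g^{(l-1)}(i)\big)^2, \qquad (A\text{-}3)$$

we have

$$\beta^{(l)} = 2 \cdot 2^{l-1} \sum_{i=1}^{2^{l-1}} \Big[\big(p_g^{(l)}(2i-1)\big)^2 + \big(p_g^{(l)}(2i)\big)^2\Big] \le 2\,\beta^{(l-1)}. \qquad (A\text{-}4)$$

The equality holds when, for all i, $p_g^{(l)}(2i-1) = 0$ or $p_g^{(l)}(2i) = 0$. That is, in each /(l - 1) subnet, the vulnerable hosts are extremely non-uniformly distributed in two groups of /l subnets. ■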
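The two bounds $\beta^{(l-1)} \le \beta^{(l)} \le 2\beta^{(l-1)}$ can be spot-checked numerically. The sketch below is only a randomized sanity check on skewed synthetic /4 distributions (an assumption of ours, not data from the paper); groups are 0-indexed, so the pair (2i - 1, 2i) of the proof becomes indices (2k, 2k + 1).

```python
import random

def beta(p):
    """Non-uniformity factor beta^{(l)} = 2^l * sum_i p(i)^2, with len(p) == 2^l."""
    return len(p) * sum(x * x for x in p)

def coarsen(p):
    """Equation (A-1): merge 0-indexed groups (2k, 2k+1) into one /(l-1) group."""
    return [p[k] + p[k + 1] for k in range(0, len(p), 2)]

def check_theorem2(trials=1000, groups=16, seed=42):
    """Return True if beta(coarse) <= beta(fine) <= 2 * beta(coarse) on every
    random distribution drawn (small relative slack for rounding)."""
    rng = random.Random(seed)
    for _ in range(trials):
        w = [rng.random() ** 3 for _ in range(groups)]  # skewed random weights
        total = sum(w)
        p = [x / total for x in w]
        fine, coarse = beta(p), beta(coarsen(p))
        if not (coarse <= fine * (1 + 1e-12) and fine <= 2 * coarse * (1 + 1e-12)):
            return False
    return True

print(check_theorem2())
```

The two equality cases of the proof are easy to confirm by hand: `[0.25] * 4` keeps β unchanged under coarsening, while `[0.5, 0.0, 0.5, 0.0]` halves it.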



APPENDIX 2: PROOF OF THEOREM 3
Proof: Since

$$p_g^{(l-1)}(i) = \sum_{j=1}^{2} p_g^{(l)}(2(i-1)+j) \ge p_g^{(l)}(2(i-1)+j), \qquad (A\text{-}5)$$

where $i = 1, 2, \ldots, 2^{l-1}$, we have

$$H(P^{(l)}) = -\sum_{i=1}^{2^{l-1}} \sum_{j=1}^{2} p_g^{(l)}(2(i-1)+j) \log_2 p_g^{(l)}(2(i-1)+j) \ge -\sum_{i=1}^{2^{l-1}} \sum_{j=1}^{2} p_g^{(l)}(2(i-1)+j) \log_2 p_g^{(l-1)}(i) = H(P^{(l-1)}). \qquad (A\text{-}6)$$

The equality holds when, for all i, $p_g^{(l)}(2i-1) = 0$ or $p_g^{(l)}(2i) = 0$. That is, in each /(l - 1) subnet, the vulnerable hosts are extremely non-uniformly distributed in two groups of /l subnets.

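On the other hand, using the log-sum inequality,

$$\sum_{j=1}^{2} p_g^{(l)}(2(i-1)+j) \log_2 p_g^{(l)}(2(i-1)+j) \ge \Big(\sum_{j=1}^{2} p_g^{(l)}(2(i-1)+j)\Big) \log_2 \frac{\sum_{j=1}^{2} p_g^{(l)}(2(i-1)+j)}{2}, \qquad (A\text{-}7)$$

we have

$$H(P^{(l)}) = -\sum_{i=1}^{2^{l-1}} \sum_{j=1}^{2} p_g^{(l)}(2(i-1)+j) \log_2 p_g^{(l)}(2(i-1)+j) \le -\sum_{i=1}^{2^{l-1}} p_g^{(l-1)}(i) \log_2 \frac{p_g^{(l-1)}(i)}{2} = H(P^{(l-1)}) + 1. \qquad (A\text{-}8)$$

The equality holds when, for all i, $p_g^{(l)}(2i-1) = p_g^{(l)}(2i) = p_g^{(l-1)}(i)/2$. That is, in each /(l - 1) subnet, the vulnerable hosts are uniformly distributed in two groups of /l subnets. ■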
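A randomized spot check of $H(P^{(l-1)}) \le H(P^{(l)}) \le H(P^{(l-1)}) + 1$, in the same spirit as the check for Theorem 2, is sketched below. The synthetic /4 distributions are our own assumption, and zero-probability groups are skipped in the entropy sum following the convention $0 \log_2 0 = 0$.

```python
import math
import random

def shannon_entropy(p):
    """H(P) in bits, using the convention 0 * log2(0) = 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def coarsen(p):
    """Equation (A-1): merge 0-indexed groups (2k, 2k+1) into one /(l-1) group."""
    return [p[k] + p[k + 1] for k in range(0, len(p), 2)]

def check_theorem3(trials=1000, groups=16, seed=7):
    """Return True if H(coarse) <= H(fine) <= H(coarse) + 1 on every
    random distribution drawn (small absolute slack for rounding)."""
    rng = random.Random(seed)
    for _ in range(trials):
        w = [rng.random() ** 3 for _ in range(groups)]  # skewed random weights
        total = sum(w)
        p = [x / total for x in w]
        fine, coarse = shannon_entropy(p), shannon_entropy(coarsen(p))
        if not (coarse <= fine + 1e-12 and fine <= coarse + 1 + 1e-12):
            return False
    return True

print(check_theorem3())
```

Here the equality cases are reversed relative to Theorem 2: `[0.5, 0.0, 0.5, 0.0]` leaves the entropy unchanged under coarsening, while `[0.25] * 4` loses exactly one bit.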